The dataset includes:
- ride_id: Unique identifier for each ride.
- rideable_type: Type of bike used.
- started_at and ended_at: Start and end times of each trip.
- start_station_name and end_station_name: Start and end stations of the trip.
- start_lat and start_lng, end_lat and end_lng: Latitude and longitude of start and end points.
- member_casual: Indicates whether the rider is a “member” or “casual”.
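Before cleaning, it can help to confirm that the columns listed above are actually present. The sketch below assumes a one-row stand-in data frame named `trips` with made-up values; only the column names come from the list above.

```r
# Hypothetical one-row stand-in for the trip data; the values are invented,
# only the column names are taken from the dataset description.
trips <- data.frame(
  ride_id = "A1", rideable_type = "classic_bike",
  started_at = "2023-12-01 08:00:00", ended_at = "2023-12-01 08:12:00",
  start_station_name = "Station A", end_station_name = "Station B",
  start_lat = 40.0, start_lng = -80.0, end_lat = 40.1, end_lng = -80.1,
  member_casual = "member"
)

expected_cols <- c(
  "ride_id", "rideable_type", "started_at", "ended_at",
  "start_station_name", "end_station_name",
  "start_lat", "start_lng", "end_lat", "end_lng", "member_casual"
)

# Columns that the description promises but the data frame lacks
missing_cols <- setdiff(expected_cols, names(trips))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```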
# Check for missing values
colSums(is.na(cleaned_dec_23_tripdata))
##                  X            ride_id      rideable_type         started_at
##                  0                  0                  0                  0
##           ended_at start_station_name   start_station_id   end_station_name
##                  0                  0                  0                  0
##     end_station_id          start_lat          start_lng            end_lat
##                  0                  0                  0                  0
##            end_lng      member_casual      trip_duration        day_of_week
##                  0                  0                  0                  0
##        hour_of_day
##                  0
# Remove rows with missing values
cleaned_dec_23_tripdata <- na.omit(cleaned_dec_23_tripdata)
# Convert date/time fields to proper format
cleaned_dec_23_tripdata$started_at <- as.POSIXct(cleaned_dec_23_tripdata$started_at, format = "%Y-%m-%d %H:%M:%S")
cleaned_dec_23_tripdata$ended_at <- as.POSIXct(cleaned_dec_23_tripdata$ended_at, format = "%Y-%m-%d %H:%M:%S")
# Remove duplicate rows (distinct() comes from the dplyr package)
cleaned_dec_23_tripdata <- dplyr::distinct(cleaned_dec_23_tripdata)
# Calculate trip duration in minutes
cleaned_dec_23_tripdata$trip_duration <- as.numeric(difftime(
  cleaned_dec_23_tripdata$ended_at,
  cleaned_dec_23_tripdata$started_at,
  units = "mins"
))
# Extract the day of the week
cleaned_dec_23_tripdata$day_of_week <- weekdays(cleaned_dec_23_tripdata$started_at)
# Extract the hour from the start time
cleaned_dec_23_tripdata$hour_of_day <- as.numeric(format(cleaned_dec_23_tripdata$started_at, "%H"))
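The derived columns are only as good as the timestamp parsing. A string that does not match the `format` argument becomes `NA` in `as.POSIXct()`, and that `NA` then propagates into the duration, weekday, and hour columns. The sketch below runs the same steps on a small hypothetical sample (the `trips` data frame, its values, and `tz = "UTC"` are assumptions made for reproducibility) and counts the rows worth inspecting.

```r
# Hypothetical sample: the second start time has no clock component,
# and the third trip ends before it starts.
trips <- data.frame(
  started_at = c("2023-12-01 08:00:00", "2023-12-02", "2023-12-03 09:30:00"),
  ended_at   = c("2023-12-01 08:12:00", "2023-12-02 00:10:00", "2023-12-03 09:25:00"),
  stringsAsFactors = FALSE
)

fmt <- "%Y-%m-%d %H:%M:%S"
# tz = "UTC" is an assumption; the original code relies on the session time zone
trips$started_at <- as.POSIXct(trips$started_at, format = fmt, tz = "UTC")
trips$ended_at   <- as.POSIXct(trips$ended_at, format = fmt, tz = "UTC")

trips$trip_duration <- as.numeric(difftime(
  trips$ended_at, trips$started_at, units = "mins"
))

# Rows whose timestamps failed to parse
n_unparsed <- sum(is.na(trips$started_at) | is.na(trips$ended_at))

# Rows with a zero or negative duration
n_nonpositive <- sum(trips$trip_duration <= 0, na.rm = TRUE)
```

Running a check like this after the conversion, rather than before it, catches missing values that the conversion itself introduces.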
##
## casual member
## 36686 130457
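The counts above are easier to compare as shares of all rides. A minimal sketch, entering the two reported counts by hand (the vector name `counts` is an assumption):

```r
# Ride counts by user type, copied from the output above
counts <- c(casual = 36686, member = 130457)

# Share of all rides per user type, in percent
shares <- round(100 * prop.table(counts), 1)
print(shares)
```

On the full data, a table like the one above could be produced with `table(cleaned_dec_23_tripdata$member_casual)`, and `prop.table()` applied to it in the same way.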
## member_casual trip_duration
## 1 casual 16.53441
## 2 member 10.80303
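A per-group mean like the one above can be computed with base R's `aggregate()`. The sketch below uses a small hypothetical `trips` data frame, not the real data, and also computes the median, which is less sensitive to a few very long rides. Whether the original summary was produced this way is an assumption.

```r
# Hypothetical sample with durations in minutes
trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  trip_duration = c(20, 10, 12, 9, 9)
)

# Mean trip duration per user type
mean_duration <- aggregate(trip_duration ~ member_casual, data = trips, FUN = mean)

# Median trip duration per user type, as a robustness check
median_duration <- aggregate(trip_duration ~ member_casual, data = trips, FUN = median)

print(mean_duration)
print(median_duration)
```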
## member_casual day_of_week count
## 1 casual Saturday 6931
## 2 casual Friday 6678
## 3 casual Thursday 5414
## 4 casual Sunday 5313
## 5 casual Wednesday 5027
## 6 casual Tuesday 3786
## 7 casual Monday 3537
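A ranking like the one above can be sketched in base R as follows. The sample data are hypothetical, and two choices here are assumptions rather than the original method: the day name is derived from the ISO weekday number (`%u`, Monday = 1) so that it does not depend on the session locale as `weekdays()` does, and it is stored as a factor so that days with no rides still appear with a count of zero.

```r
# Hypothetical sample of rides (2023-12-01 was a Friday)
trips <- data.frame(
  member_casual = c("casual", "casual", "casual", "member"),
  started_at = as.POSIXct(
    c("2023-12-02 10:00:00", "2023-12-09 11:00:00",
      "2023-12-01 18:00:00", "2023-12-04 08:00:00"),
    tz = "UTC"
  )
)

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Locale-independent day name via the ISO weekday number
iso_day <- as.integer(format(trips$started_at, "%u"))
trips$day_of_week <- factor(day_names[iso_day], levels = day_names)

# Count casual rides per day and sort from busiest to quietest
casual <- trips[trips$member_casual == "casual", ]
day_counts <- as.data.frame(
  table(day_of_week = casual$day_of_week),
  responseName = "count"
)
day_counts <- day_counts[order(-day_counts$count), ]
print(day_counts)
```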
#### Identify popular start stations for casual riders
##
##                 casual member
##   classic_bike   20280  84044
##   electric_bike  16406  46413
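Because members take far more rides overall, the raw counts are easier to compare as column percentages, that is, the bike-type mix within each user type. A minimal sketch, with the four reported counts entered by hand (the object names are assumptions):

```r
# Counts copied from the cross-tabulation above; matrix() fills column by column
bike_by_rider <- matrix(
  c(20280, 16406, 84044, 46413),
  nrow = 2,
  dimnames = list(
    rideable_type = c("classic_bike", "electric_bike"),
    member_casual = c("casual", "member")
  )
)

# Percentage of each bike type within each user type (columns sum to 100)
col_shares <- round(100 * prop.table(bike_by_rider, margin = 2), 1)
print(col_shares)
```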
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
#### Heat map: peak hours by user type
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_tile()`).
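A heat map of this kind needs one row per combination of hour and user type. The sketch below builds that table from a small hypothetical sample and drops rows with a missing hour first. A missing hour is one possible cause of the `geom_tile()` warning above, not a confirmed one. The plotting call is a guess at the shape of the original, and it is skipped if ggplot2 is not installed.

```r
# Hypothetical sample; the last row has no usable start hour
trips <- data.frame(
  member_casual = c("casual", "member", "member", "member", "casual"),
  hour_of_day   = c(17, 8, 8, 17, NA)
)

# Drop rows that cannot be placed on the hour axis
plot_data <- trips[!is.na(trips$hour_of_day), ]

# Rides per hour and user type; table() also fills empty combinations with 0
hour_counts <- as.data.frame(
  table(hour_of_day = plot_data$hour_of_day,
        member_casual = plot_data$member_casual),
  responseName = "rides"
)

# table() stores hours as factor labels; convert back for a numeric axis
hour_counts$hour_of_day <- as.integer(as.character(hour_counts$hour_of_day))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  heat_map <- ggplot(hour_counts,
                     aes(x = hour_of_day, y = member_casual, fill = rides)) +
    geom_tile() +
    labs(x = "Hour of day", y = "User type", fill = "Rides")
}
```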